
    Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

    High performance grid computing is a key enabler of large-scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. To achieve cost effectiveness in these systems, it is essential for scheduling algorithms to exploit the electricity price variations, in both space and time, that are prevalent in dynamic electricity markets. In this paper, we present a metascheduling algorithm to optimize the placement of jobs in a compute grid that consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace-based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively comprises more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems.
    Comment: Appears in IEEE Transactions on Parallel and Distributed Systems.
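    The abstract does not reproduce the paper's MCMF formulation; the sketch below is only a minimal illustration of the general idea, casting job placement as Minimum Cost Maximum Flow with networkx. The job list, slot counts, and integer per-job costs are hypothetical placeholders standing in for the paper's queue-wait and day-ahead price predictions.

        # Minimal sketch (not the paper's formulation): job placement as MCMF.
        import networkx as nx

        jobs = {"j1": 1, "j2": 1, "j3": 1}      # slots each job needs (hypothetical)
        systems = {"sysA": 2, "sysB": 1}        # free slots per system (hypothetical)
        # cost[j][s]: estimated cost (integer cents) of running job j on system s;
        # in the paper this combines queue-wait and electricity price predictions.
        cost = {
            "j1": {"sysA": 40, "sysB": 25},
            "j2": {"sysA": 30, "sysB": 55},
            "j3": {"sysA": 20, "sysB": 35},
        }

        G = nx.DiGraph()
        for j, need in jobs.items():
            G.add_edge("src", j, capacity=need, weight=0)
        for s, slots in systems.items():
            G.add_edge(s, "sink", capacity=slots, weight=0)
        for j in jobs:
            for s in systems:
                G.add_edge(j, s, capacity=1, weight=cost[j][s])

        # Max flow places every job; min cost picks the cheapest placement.
        flow = nx.max_flow_min_cost(G, "src", "sink")
        placement = {j: s for j in jobs for s in systems if flow[j].get(s, 0) > 0}
        print(placement)  # {'j1': 'sysB', 'j2': 'sysA', 'j3': 'sysA'}, total cost 75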

    Noise-Adaptive Compiler Mappings for Noisy Intermediate-Scale Quantum Computers

    A massive gap exists between current quantum computing (QC) prototypes and the size and scale required for many proposed QC algorithms. Current QC implementations are prone to noise and variability which affect their reliability, and yet with fewer than 80 quantum bits (qubits) in total, they are too resource-constrained to implement error correction. The term Noisy Intermediate-Scale Quantum (NISQ) refers to these current and near-term systems of 1000 qubits or fewer. Given NISQ's severe resource constraints, low reliability, and high variability in physical characteristics such as coherence time and error rates, it is of pressing importance to map computations onto them in ways that use resources efficiently and maximize the likelihood of successful runs. This paper proposes and evaluates backend compiler approaches to map and optimize high-level QC programs to execute with high reliability on NISQ systems with diverse hardware characteristics. Our techniques all start from an LLVM intermediate representation of the quantum program (such as would be generated from high-level QC languages like Scaffold) and generate QC executables runnable on the IBM Q public QC machine. We then use this framework to implement and evaluate several optimal and heuristic mapping methods. These methods vary in how they account for the availability of dynamic machine calibration data, the relative importance of various noise parameters, the different possible routing strategies, and the relative importance of compile-time scalability versus runtime success. Using real-system measurements, we show that fine-grained spatial and temporal variations in hardware parameters can be exploited to obtain an average 2.9x (and up to 18x) improvement in program success rate over the industry-standard IBM Qiskit compiler.
    Comment: To appear in ASPLOS'19.
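    The core noise-adaptive idea, scoring candidate qubit mappings with current calibration data, can be sketched in a few lines. This is an illustrative toy, not the paper's algorithm: the error rates and the two candidate mappings are made up, and the success estimate is the standard first-order product of per-operation success probabilities.

        # Hedged sketch of noise-adaptive mapping selection (illustrative values).
        cnot_err = {(0, 1): 0.030, (1, 2): 0.015, (2, 3): 0.012, (3, 4): 0.045}
        readout_err = {0: 0.02, 1: 0.10, 2: 0.03, 3: 0.01, 4: 0.07}

        def success_estimate(cnot_edges, measured_qubits):
            """Estimate run success as the product of individual gate and
            readout success probabilities (first-order reliability model)."""
            p = 1.0
            for e in cnot_edges:
                p *= 1.0 - cnot_err[e]
            for q in measured_qubits:
                p *= 1.0 - readout_err[q]
            return p

        # Two hypothetical placements of a 3-qubit program whose CNOTs act on
        # adjacent pairs; the compiler keeps the higher-scoring mapping.
        candidates = {
            "map_low": (((0, 1), (1, 2)), (0, 1, 2)),
            "map_high": (((2, 3), (1, 2)), (1, 2, 3)),
        }
        best = max(candidates, key=lambda m: success_estimate(*candidates[m]))
        print(best, success_estimate(*candidates[best]))  # map_high ~0.84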

    Full-Stack, Real-System Quantum Computer Studies: Architectural Comparisons and Design Insights

    In recent years, Quantum Computing (QC) has progressed to the point where small working prototypes are available for use. Termed Noisy Intermediate-Scale Quantum (NISQ) computers, these prototypes are too small for large benchmarks or even for Quantum Error Correction, but they do have sufficient resources to run small benchmarks, particularly if compiled with optimizations to make use of scarce qubits and limited operation counts and coherence times. QC has not yet, however, settled on a particular preferred device implementation technology, and indeed different NISQ prototypes implement qubits with very different physical approaches and therefore widely varying device and machine characteristics. Our work performs a full-stack, benchmark-driven hardware-software analysis of QC systems. We evaluate QC architectural possibilities, software-visible gates, and software optimizations to tackle fundamental design questions about gate set choices, communication topology, the factors affecting benchmark performance, and compiler optimizations. In order to answer key cross-technology and cross-platform design questions, our work has built the first top-to-bottom toolflow to target different qubit device technologies, including superconducting and trapped ion qubits, which are the current QC front-runners. We use our toolflow, TriQ, to conduct real-system measurements on 7 running QC prototypes from 3 different groups: IBM, Rigetti, and the University of Maryland. From these real-system experiences at QC's hardware-software interface, we make observations about native and software-visible gates for different QC technologies, communication topologies, and the value of noise-aware compilation even on lower-noise platforms. This is the largest cross-platform real-system QC study performed thus far; its results have the potential to inform both QC device and compiler design going forward.
    Comment: Preprint of a publication in ISCA 2019.
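    One cross-platform question the study examines, how communication topology affects routing cost, is easy to illustrate. The sketch below is an assumed first-order model, not TriQ itself: it counts the SWAPs needed to make each two-qubit gate's operands adjacent on a linear superconducting-style chain versus an all-to-all trapped-ion-style topology, using a toy gate list.

        # Assumed first-order routing model, not TriQ: SWAP counts on two topologies.
        import networkx as nx

        def swap_cost(topology, two_qubit_ops):
            """SWAPs needed if each gate's operands must be adjacent; each hop
            along a shortest path costs one SWAP (a common rough estimate)."""
            return sum(nx.shortest_path_length(topology, a, b) - 1
                       for a, b in two_qubit_ops)

        linear = nx.path_graph(5)        # superconducting-style chain
        full = nx.complete_graph(5)      # trapped-ion-style all-to-all
        ops = [(0, 4), (1, 3), (0, 2)]   # hypothetical two-qubit gate list
        print(swap_cost(linear, ops))    # 5 SWAPs on the chain
        print(swap_cost(full, ops))      # 0 SWAPs with full connectivity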

    On Optimizing Distributed Tucker Decomposition for Dense Tensors

    The Tucker decomposition expresses a given tensor as the product of a small core tensor and a set of factor matrices. Apart from providing data compression, the construction is useful in performing analysis such as principal component analysis (PCA) and finds applications in diverse domains such as signal processing, computer vision, and text analytics. Our objective is to develop an efficient distributed implementation for the case of dense tensors. The implementation is based on the HOOI (Higher Order Orthogonal Iteration) procedure, wherein the tensor-times-matrix product forms the core routine. Prior work has proposed heuristics for reducing the computational load and communication volume incurred by the routine. We study the two metrics in a formal and systematic manner, and design strategies that are optimal under the two fundamental metrics. Our experimental evaluation on a large benchmark of tensors shows that the optimal strategies provide significant reduction in load and volume compared to prior heuristics, and provide up to 7x speed-up in the overall running time.
    Comment: A preliminary version of the paper appears in the proceedings of IPDPS'17.
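    To make the core routine concrete, here is a compact single-node HOOI sketch in NumPy; the paper's contribution is distributing exactly these tensor-times-matrix (TTM) products and unfoldings with optimal load and communication volume. The tensor size, ranks, and iteration count below are arbitrary.

        # Single-node HOOI sketch in NumPy (the paper distributes this).
        import numpy as np

        def unfold(T, mode):
            """Mode-n matricization: `mode` becomes rows, the rest columns."""
            return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

        def ttm(T, M, mode):
            """Tensor-times-matrix along `mode` (the HOOI core routine)."""
            out = np.tensordot(M, T, axes=(1, mode))  # contract M cols with mode
            return np.moveaxis(out, 0, mode)

        def hooi(X, ranks, iters=10):
            # Initialize factors from each unfolding's leading singular vectors (HOSVD).
            U = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
                 for n, r in enumerate(ranks)]
            for _ in range(iters):
                for n in range(X.ndim):
                    Y = X
                    for m in range(X.ndim):
                        if m != n:
                            Y = ttm(Y, U[m].T, m)     # project all modes but n
                    U[n] = np.linalg.svd(unfold(Y, n),
                                         full_matrices=False)[0][:, :ranks[n]]
            core = X
            for m in range(X.ndim):
                core = ttm(core, U[m].T, m)
            return core, U

        X = np.random.rand(8, 9, 10)
        core, U = hooi(X, (3, 3, 3))
        print(core.shape)  # (3, 3, 3)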

    Formal Constraint-based Compilation for Noisy Intermediate-Scale Quantum Systems

    Noisy, intermediate-scale quantum (NISQ) systems are expected to have a few hundred qubits, minimal or no error correction, limited connectivity, and limits on the number of gates that can be performed within the short coherence window of the machine. The past decade's research on quantum programming languages and compilers is directed towards large systems with thousands of qubits. For near-term quantum systems, it is crucial to design tool flows which make efficient use of the hardware resources without sacrificing the ease and portability of a high-level programming environment. In this paper, we present a compiler for the Scaffold quantum programming language whose aggressive optimization specifically targets NISQ machines with hundreds of qubits. Our compiler extracts gates from a Scaffold program and formulates a constrained optimization problem which considers both program characteristics and machine constraints. Using the Z3 SMT solver, the compiler maps program qubits to hardware qubits, schedules gates, and inserts CNOT routing operations while optimizing the overall execution time. The output of the optimization is used to produce target code in the OpenQASM language, which can be executed on existing quantum hardware such as the 16-qubit IBM machine. Using real and synthetic benchmarks, we show that it is feasible to synthesize near-optimal compiled code for current and small NISQ systems. For large programs and machine sizes, the SMT optimization approach can be used to synthesize compiled code that is guaranteed to finish within the coherence window of the machine.
    Comment: Invited paper in the Special Issue on Quantum Computer Architecture: a full-stack overview, Microprocessors and Microsystems.
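    A toy version of the constraint-based mapping idea, using the same Z3 solver the compiler relies on. The machine graph, program, and objective below are simplified placeholders (one unit of cost per non-adjacent CNOT), not the paper's full encoding of gate scheduling and routing.

        # Toy constraint-based qubit mapping with Z3 (simplified placeholder model).
        from z3 import Int, Optimize, Distinct, Or, And, If, Sum, sat

        n_prog = 3
        hw_edges = [(0, 1), (1, 2), (2, 3), (1, 4)]  # toy 5-qubit machine
        cnots = [(0, 1), (1, 2), (0, 2)]             # toy program CNOT list

        loc = [Int(f"loc_{q}") for q in range(n_prog)]  # program -> hardware qubit
        opt = Optimize()
        opt.add(Distinct(loc))                          # injective mapping
        opt.add([And(0 <= l, l < 5) for l in loc])

        def adjacent(a, b):
            return Or([Or(And(a == u, b == v), And(a == v, b == u))
                       for u, v in hw_edges])

        # Cost 0 if operands are adjacent, else 1 routing SWAP (simplified).
        cost = Sum([If(adjacent(loc[a], loc[b]), 0, 1) for a, b in cnots])
        opt.minimize(cost)
        if opt.check() == sat:
            m = opt.model()
            print({q: m[loc[q]].as_long() for q in range(n_prog)})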

    Byzantine-Resilient Federated Learning with Heterogeneous Data Distribution

    For mitigating Byzantine behaviors in federated learning (FL), most state-of-the-art approaches, such as Bulyan, tend to leverage the similarity of updates from the benign clients. However, in many practical FL scenarios, data is non-IID across clients, so the updates received even from the benign clients are quite dissimilar. Hence, using similarity-based methods results in wasted opportunities to train a model from interesting non-IID data, and also in slower model convergence. We propose DiverseFL to overcome this challenge in heterogeneous data distribution settings. Rather than comparing each client's update with other client updates to detect Byzantine clients, DiverseFL compares each client's update with a guiding update for that client. Any client whose update diverges from its associated guiding update is then tagged as a Byzantine node. The FL server in DiverseFL computes the guiding update in every round for each client over a small sample of the client's local data that is received only once, before the start of training. However, sharing even a small sample of a client's data with the FL server can compromise the client's data privacy. To tackle this challenge, DiverseFL creates a Trusted Execution Environment (TEE)-based enclave to receive each client's sample and to compute its guiding updates. The TEE provides hardware-assisted verification and attestation to each client that its data is not leaked outside of the TEE. Through experiments involving neural networks, benchmark datasets, and popular Byzantine attacks, we demonstrate that DiverseFL not only performs Byzantine mitigation quite effectively, but also nearly matches the performance of OracleSGD, where the server only aggregates the updates from the benign clients.
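    The per-client check described above can be sketched in a few lines. This is an illustrative stand-in, not DiverseFL's exact rule: the cosine-similarity metric, the threshold, and the update vectors are assumptions, and the real guiding update is computed inside the TEE from the client's own sample.

        # Illustrative per-client guiding-update check (assumed metric/threshold).
        import numpy as np

        def is_byzantine(client_update, guiding_update, sim_threshold=0.0):
            """Flag a client whose update points away from its guiding update,
            which the server computes (inside a TEE) on that client's sample."""
            c, g = client_update.ravel(), guiding_update.ravel()
            cos = float(c @ g) / (np.linalg.norm(c) * np.linalg.norm(g) + 1e-12)
            return cos < sim_threshold

        rng = np.random.default_rng(0)
        g = rng.normal(size=100)                 # guiding update for one client
        honest = g + 0.3 * rng.normal(size=100)  # noisy but aligned update
        attack = -g                              # sign-flipping Byzantine update
        print(is_byzantine(honest, g), is_byzantine(attack, g))  # False True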

    Effect of Fibres on the Behaviour of Bottle-Shaped Struts

    Bottle-shaped struts are the critical elements in the design of D-regions using the Strut-and-Tie method. Transverse tension develops due to the dispersion of the compression load, which leads to splitting cracks in a bottle-shaped strut. Because of this, the strut fails before reaching its ultimate compression capacity. Higher resistance to the transverse tension can improve the strut capacity and its efficiency in load transfer. Transverse tensile stress can be resisted by providing steel reinforcement or through the addition of discrete fibres in the concrete. Many international codes, such as ACI, AASHTO, and CSA, have suggested guidelines on the addition of steel fibre reinforcement in bottle-shaped struts for resisting the transverse tension. Still, the influence of discrete fibres on the performance of the bottle-shaped strut is not well established. The performance of the bottle-shaped strut in terms of efficiency factors, crack pattern, and failure mode for different amounts of macro steel fibres and micro polypropylene fibres is studied through experimental investigation. Specimens of 600 mm x 600 mm x 100 mm size were tested under compression. Steel fibres were added to the concrete in proportions of 0.7%, 0.9%, and 1.1% volume fractions. The effect of fibre hybridization was also studied by adding micro polypropylene fibres in proportions of 1% and 2% in addition to the steel fibres. Experimental results showed that adding discrete fibres to concrete significantly improved the resistance to transverse tension in bottle-shaped struts and increased the load-carrying capacity of the specimens. A 75% improvement in the efficiency factor is observed at 0.9% volume of steel fibre addition. The addition of micro polypropylene fibres to the macro steel fibres further enhanced the load-carrying capacity of the bottle-shaped struts. Microfibres in the concrete effectively arrested the micro-cracks and delayed the occurrence of the first splitting crack in the strut region. As a result, the mode of failure became ductile, through the formation of a greater number of small cracks with smaller crack widths at the ultimate load. The results of this study clearly show that the addition of discrete fibres to concrete is an effective solution to improve the performance of bottle-shaped struts in terms of ultimate strength and serviceability. © 2021 Institute of Physics Publishing

    Optical detection of the structural properties of tumor tissue generated by xenografting of drug-sensitive and drug-resistant cancer cells using partial wave spectroscopy (PWS)

    A mesoscopic physics-based optical imaging technique, partial wave spectroscopy (PWS), has been used for the detection of cancer by probing nanoscale structural alterations in cells/tissue. The development of drug-resistant cancer cells/tissues during chemotherapy is a major challenge in cancer treatment. In this paper, using a mouse model and PWS, the structural properties of tumor tissue grown in 3D structures by xenografting drug-resistant and drug-sensitive human prostate cancer cells having 2D structures are studied. The results show that the 3D xenografted tissues maintain a hierarchy of the degree of structural disorder similar to that of the original 2D drug-sensitive and drug-resistant cells.